{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from collections import Counter, defaultdict\n",
    "import os\n",
    "\n",
    "from nltk.tag import pos_tag\n",
    "\n",
    "from gtnlplib import coref, coref_rules, coref_features, coref_learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 1: Exploring the data\n",
    "\n",
    "The core data is in the form of \"markables\", which refer to token sequences that can participate in coreference relations.\n",
    "\n",
    "Each markable has four elements:\n",
    "- ```string```, which is a list of tokens\n",
    "- ```entity```, which defines the ground truth assignments\n",
    "- ```start_token```, the index of the first token in the markable with respect to the entire document\n",
    "- ```end_token```, one plus the index of the last token in the markable\n",
    "\n",
    "The ```read_data``` function also returns a list of tokens. \n",
    "You can use this to incorporate the linguistic context around each markable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "dv_dir = os.path.join('data','dev')\n",
    "tr_dir = os.path.join('data','tr')\n",
    "te_dir = os.path.join('data','te-hidden-labels')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "markables,words = coref.read_data('Johnston Atoll',basedir=tr_dir)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'end_token': 21, 'start_token': 19, 'string': ['The', 'atoll'], 'entity': u'set_76'}\n",
      "['The', 'atoll']\n"
     ]
    }
   ],
   "source": [
    "print markables[3]\n",
    "print words[markables[3]['start_token']:markables[3]['end_token']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Deliverable 1.1**: Write a function that returns all the markable **strings** associated with a given entity. Specifically, fill in the function ```get_markables_for_entity``` in ```coref.py```.\n",
    "(0.5 pts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Johnston and Sand Island',\n",
       " 'Johnston and Sand islands',\n",
       " 'The islands',\n",
       " 'the area',\n",
       " 'the islands',\n",
       " 'them']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reload(coref);\n",
    "sorted(coref.get_markables_for_entity(markables,'set_100'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Deliverable 1.2** Write a function that takes as input a string, and returns a list of distances to the most recent ground truth antecedent for every time the input string appears. For example, if the input is \"they\", it should make a list with one element for each time the word \"they\" appears in the list of markables. Each element should be the distance of the word \"they\" to the nearest previous mention of the entity that \"they\" references.\n",
    "\n",
    "Fill in the function ```get_distances_for_term``` in ```coref.py```. If the input string is not anaphoric, the distance should be zero. Note that input strings may contain spaces. You may use any other function in ```coref.py``` to help you. (0.5 pts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[2, 2, 1, 2]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "coref.get_distances(markables,'they')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's compare the typical distances for various mention types.\n",
    "\n",
    "You can see the most frequent mention types by using the Counter class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('Johnston Atoll', 13),\n",
       " ('the atoll', 13),\n",
       " ('the island', 11),\n",
       " ('it', 8),\n",
       " ('Johnston Island', 5)]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Counter([' '.join(markable['string']) for markable in markables]).most_common(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 4, 10, 3, 3, 1, 4, 3, 3, 1, 4, 7, 7]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "coref.get_distances(markables,'Johnston Atoll')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[2, 4, 1, 2, 3, 7, 3, 6, 2, 3, 1, 2]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "coref.get_distances(markables,'the island')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[3, 1, 3, 1, 2, 1, 3, 1, 1, 1, 2]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "coref.get_distances(markables,'it')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Rule-based coreference resolution\n",
    "\n",
    "We have written a simple coreference classifier, which predicts that each markable is linked to the most recent antecedent which is an exact string match.\n",
    "\n",
    "The code block below applies this method to the dev set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "exact_matcher = coref_rules.make_resolver(coref_rules.exact_match)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code above has two pieces:\n",
    "\n",
    "- ```coref_rules.exact_match``` is a function that takes two markables, and returns True iff they are an exact (case-insensitive) string match\n",
    "- ```make_resolver``` is a function that takes a matching function, and returns a function that computes an antecedent list for a list of markables.\n",
    "\n",
    "Let's run it."
   ]
  },
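  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the division of labor concrete, here is a minimal sketch of how a matching function and ```make_resolver``` could fit together. This is illustrative only; the real implementations in ```coref_rules.py``` may differ in detail.\n",
    "\n",
    "```python\n",
    "def exact_match_sketch(m_a, m_i):\n",
    "    # case-insensitive comparison of the two token lists\n",
    "    return [w.lower() for w in m_a['string']] == [w.lower() for w in m_i['string']]\n",
    "\n",
    "def make_resolver_sketch(matcher):\n",
    "    def resolver(markables):\n",
    "        # link each markable to its most recent matching antecedent,\n",
    "        # defaulting to itself (no antecedent)\n",
    "        antecedents = []\n",
    "        for i in range(len(markables)):\n",
    "            antecedents.append(i)\n",
    "            for a in range(i - 1, -1, -1):\n",
    "                if matcher(markables[a], markables[i]):\n",
    "                    antecedents[-1] = a\n",
    "                    break\n",
    "        return antecedents\n",
    "    return resolver\n",
    "```"
   ]
  },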
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 11, 12, 13, 3, 15, 16, 17, 18, 19]\n"
     ]
    }
   ],
   "source": [
    "ant_exact = exact_matcher(markables)\n",
    "print ant_exact[:20]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output is a list of antecedent numbers, $c_i$. \n",
    "When $c_i = i$, the markable $i$ has no antecedent: it is the first mention of its entity.\n",
    "\n",
    "We can test whether these predictions are correct by comparing against the key."
   ]
  },
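  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since every antecedent index $c_i$ is at most $i$, following the pointers transitively partitions the markables into entity clusters (the course code's ```coref.markables_to_entities``` provides such a mapping). A sketch of that reading:\n",
    "\n",
    "```python\n",
    "def antecedents_to_clusters(antecedents):\n",
    "    # map each markable to the first mention of its chain,\n",
    "    # which serves as the cluster label\n",
    "    cluster_of = {}\n",
    "    for i, a in enumerate(antecedents):\n",
    "        cluster_of[i] = i if a == i else cluster_of[a]\n",
    "    return cluster_of\n",
    "\n",
    "# e.g. antecedents [0, 1, 0, 2] give clusters {0: 0, 1: 1, 2: 0, 3: 0}:\n",
    "# markables 0, 2, and 3 are mentions of the same entity\n",
    "```"
   ]
  },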
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "ant_true = coref.get_true_antecedents(markables)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "correct: 76\taccuracy: 0.353\n"
     ]
    }
   ],
   "source": [
    "num_correct = sum([c_true==c_predict for c_true,c_predict in zip(ant_true,ant_exact)])\n",
    "acc = num_correct/float(len(markables))\n",
    "print \"correct: %d\\taccuracy: %.3f\"%(num_correct,acc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation\n",
    "\n",
    "Coreference is typically evaluated in terms of recall, precision, and F-measure. Here is how we will define these terms:\n",
    "\n",
    "- **True positive**: The system predicts $\\hat{c}_i < i$, and $\\hat{c}_i$ and $i$ are references to the same entity.\n",
    "- **False positive**: The system predicts $\\hat{c}_i < i$, but $\\hat{c}_i$ and $i$ are not references to the same entity.\n",
    "- **False negative**: There exists some $c_i < i$ such that $c_i$ and $i$ are references to the same entity, but the system predicts either $\\hat{c}_i = i$, or some $\\hat{c}_i$ which is not really a reference to the same entity that $i$ references.\n",
    "- Recall = $\\frac{tp}{tp + fn}$\n",
    "- Precision = $\\frac{tp}{tp + fp}$\n",
    "- F-measure = $\\frac{2RP}{R+P}$\n",
    "\n",
    "A couple of things to notice here:\n",
    "\n",
    "- There is no reward for correctly identifying a markable as non-anaphoric (not having any antecedent), but you do avoid committing a false positive by doing this.\n",
    "- You cannot compute the evaluation by directly matching the predicted antecedents to the true antecedents. Suppose the truth is $a \\leftarrow b, b \\leftarrow c$, but the system predicts $a \\leftarrow b, a \\leftarrow c$: the system should receive two true positives, since $a$ and $c$ are references to the same entity in the ground truth.\n",
    "\n",
    "**Deliverable 2.1** Implement `get_tp`, `get_fp`, and `get_fn` in ```coref.py```. You will want to use the function ```coref.get_entities```.  (1 point)\n",
    "\n",
    "**NOTE!** You **must** successfully complete this deliverable. Otherwise, some of the unit tests won't work and you won't be able to complete the rest of the assignment."
   ]
  },
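  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check on these definitions, here is one way the true-positive count could be computed, as a hedged sketch: assume ```true_entity``` maps each markable index to its gold entity (the kind of mapping ```coref.get_entities``` should help you build).\n",
    "\n",
    "```python\n",
    "def count_tp_sketch(antecedents, true_entity):\n",
    "    # a predicted link from i back to a < i is a true positive whenever\n",
    "    # a and i are mentions of the same gold entity -- even if a is not\n",
    "    # the most recent gold antecedent\n",
    "    return sum(1 for i, a in enumerate(antecedents)\n",
    "               if a < i and true_entity[a] == true_entity[i])\n",
    "```"
   ]
  },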
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.619607843137 0.473053892216 0.897727272727\n"
     ]
    }
   ],
   "source": [
    "f,r,p = coref.evaluate(exact_matcher,markables)\n",
    "print f,r,p"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "all_markables,all_words = coref.read_dataset(tr_dir)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6608\tR: 0.5259\tP:0.8886\n"
     ]
    }
   ],
   "source": [
    "coref.eval_on_dataset(exact_matcher,all_markables);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Increasing precision\n",
    "\n",
    "The ```exact_match``` function matches everything, including pronouns. This can lead to mistakes:\n",
    "\n",
    "\"Umashanthi ate pizza until she was full. Parvati kept eating until she had a stomach ache.\"\n",
    "\n",
    "In this example, both pronouns likely refer to the names that immediately precede them, and not to each other.\n",
    "\n",
    "**Deliverable 2.2** The file ```coref_rules.py``` contains the signature for a function ```exact_match_no_pronoun```, which solves this problem by only predicting matches between markables that are not pronouns. Implement and test this function. For now, you may use the list of pronouns provided in the code file ```coref_rules.py```.\n",
    "\n",
    "(0.5 points 4650 / 0.25 points 7650)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "reload(coref_rules);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "no_pro_matcher = coref_rules.make_resolver(coref_rules.exact_match_no_pronouns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6419\tR: 0.4868\tP:0.9421\n"
     ]
    }
   ],
   "source": [
    "f,r,p = coref.eval_on_dataset(no_pro_matcher,all_markables);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Precision has increased, but recall decreased, dragging down the overall F-measure."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Increasing recall\n",
    "\n",
    "Our current matcher is very conservative. Let's try to increase recall. One solution is match on the **head word** of each markable. \n",
    "\n",
    "As you know, in a CFG parse, the head word is defined by a set of rules: for example, the head of a determiner-noun construction is the noun. In a dependency parse, the head word would be the root of the subtree governing the markable span. But this assumes that the markables correspond to syntactic constituents or dependency subtrees. This is not guaranteed to be true -- particularly when there are parsing errors.\n",
    "\n",
    "**Deliverable 2.3** Let's start with a much simpler head-finding heuristic: simply select the last word in the markable. This handles many cases --- but as we will see, not all. To do this, implement the function ```match_last_token``` in ```coref_rules.py```. This function should not match pronouns, but should match all other cases where the final tokens match. (0.5 points 4650 / 0.25 points 7650)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "reload(coref_rules);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "last_tok_matcher = coref_rules.make_resolver(coref_rules.match_last_token)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6482\tR: 0.5959\tP:0.7105\n"
     ]
    }
   ],
   "source": [
    "coref.eval_on_dataset(last_tok_matcher,all_markables);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Recall is up, but precision is back down. To try to increase precision, let's add one more rule: two markables cannot coref if their spans overlap. This can happen with nested mentions, such as \"(the president (of the united states))\". Under our last-token rule, these two mentions would co-refer, but logically, overlapping markables cannot refer to the same entity. \n",
    "\n",
    "**Deliverable 2.4** Fill in the function ```match_last_token_no_overlap```, which should match any two markables that share the same last token, unless their spans overlap. Use the ```start_token``` and ```end_token``` fields of each markable to determine whether they overlap. (0.5 points / 0.25 points)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "reload(coref_rules);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6723\tR: 0.6108\tP:0.7476\n"
     ]
    }
   ],
   "source": [
    "coref.eval_on_dataset(coref_rules.make_resolver(coref_rules.match_last_token_no_overlap),all_markables);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both recall and precision increase. Why would recall increase? The restriction does not create any new coreference links, but it changes some incorrect links to correct links. This increases the number of true positives and reduces the number of false negatives."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Error analysis\n",
    "\n",
    "To see whether we can do even better, let's try some error analysis on a specific file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# predicted antecedent series\n",
    "ant = coref_rules.make_resolver(coref_rules.match_last_token_no_overlap)(markables)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# let's look at large entities\n",
    "m2e,e2m = coref.markables_to_entities(markables,ant)\n",
    "big_entities = [ent for ent,vals in e2m.iteritems() if len(vals)>20]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Entity 0: 31 mentions\n",
      "['Johnston Atoll', 'The atoll', 'Johnston Atoll', 'the atoll', 'the atoll', 'the atoll', 'Johnston Atoll', 'the atoll', 'Johnston Atoll', 'the atoll', 'the atoll', 'the atoll', 'Johnston Atoll', 'Johnston Atoll', 'Johnston Atoll', 'The atoll', 'Johnston Atoll', 'Johnston Atoll', 'the atoll', 'the atoll', 'the deserted atoll', 'the Atoll', 'Johnston Atoll', 'the atoll', 'Johnston Atoll', 'Johnston Atoll', 'Johnston Atoll', 'the atoll', 'the atoll', 'Seabird species recorded as breeding on the atoll', 'the atoll']\n",
      "\n",
      "Entity 22: 21 mentions\n",
      "['Sand Island', 'the island', 'Sand Island', 'Johnston Island', 'Johnston Island', 'Johnston Island', 'the island', 'the island', 'Johnston Island', 'the island', 'the island', 'Johnston Island', 'the island', 'the island', 'the island', 'the island', 'The island', 'The central means of transportation to this island', 'this island', 'the island', 'the island']\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for entity in big_entities:\n",
    "    print 'Entity %d: %d mentions'%(entity,len(e2m[entity]))\n",
    "    print [' '.join(markables[idx]['string']) for idx in e2m[entity]]\n",
    "    print"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Incorporating parts of speech\n",
    "\n",
    "One clear mistake is that we are matching \"Sand Island\" to \"Johnston Island\". The last token heuristic is the culprit: in this case, the first token is a key disambiguator. Let's try a more syntactically-motivated approach. \n",
    "\n",
    "Instead of matching the last token (low precision) or matching on all tokens (low recall), let's try matching on all *content* words. Let's start by including only the following grammatical categories:\n",
    "\n",
    "- Nouns (proper, common, singular, plural)\n",
    "- Pronouns (including possessive)\n",
    "- Cardinal numbers\n",
    "\n",
    "To get these categories, we can call ```read_dataset``` with an optional argument, a part of speech tagger. We'll use NLTK for this project, which has a structured perceptron tagger on the [PTB tagset](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html). "
   ]
  },
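  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One simple way to pick out these categories is by PTB tag prefix. The prefixes below are an assumption about how to cover the list above (```NN*``` for nouns, ```PRP``` and ```PRP$``` for pronouns, ```CD``` for cardinal numbers):\n",
    "\n",
    "```python\n",
    "CONTENT_TAG_PREFIXES = ('NN', 'PRP', 'CD')\n",
    "\n",
    "def content_words_sketch(markable):\n",
    "    # keep only tokens whose PTB tag marks a noun, pronoun, or number\n",
    "    return [w.lower() for w, t in zip(markable['string'], markable['tags'])\n",
    "            if t.startswith(CONTENT_TAG_PREFIXES)]\n",
    "```"
   ]
  },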
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "all_markables,_ = coref.read_dataset(tr_dir,tagger=pos_tag)\n",
    "all_markables_dev,_ = coref.read_dataset(dv_dir,tagger=pos_tag)\n",
    "all_markables_te,_ = coref.read_dataset(te_dir,tagger=pos_tag)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'end_token': 30,\n",
       " 'entity': u'set_8',\n",
       " 'start_token': 26,\n",
       " 'string': ['the', 'coral', 'reef', 'platform'],\n",
       " 'tags': ['DT', 'JJ', 'NN', 'NN']}"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_markables[7][4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the markables now contain an additional ```tags``` field, with the part of speech tags for each token in the 'string' field.\n",
    "\n",
    "**Deliverable 2.5** Now implement a new matcher, ```match_on_content``` in ```coref_rules.py```. Your code should match $m_a$ and $m_i$ iff all content words are identical. It should also enforce the \"no overlap\" restriction defined above. (0.5 points 4650 / 0.25 points 7650)\n",
    "\n",
    "Run the cells below to run on the dev and test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6897\tR: 0.5783\tP:0.8545\n"
     ]
    }
   ],
   "source": [
    "coref.eval_on_dataset(coref_rules.make_resolver(coref_rules.match_on_content),all_markables);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Deliverable 2.6** Run the code blocks below to output predictions for the dev and test data. (0.25 points)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "coref.write_predictions(coref_rules.make_resolver(coref_rules.match_on_content),\n",
    "                        all_markables_dev,\n",
    "                        'predictions/rules-dev.preds')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6830\tR: 0.5889\tP:0.8130\n",
      "0.683029453015\n"
     ]
    }
   ],
   "source": [
    "f,r,p = coref.eval_predictions('predictions/rules-dev.preds',all_markables_dev);\n",
    "print f"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "coref.write_predictions(coref_rules.make_resolver(coref_rules.match_on_content),\n",
    "                        all_markables_te,\n",
    "                        'predictions/rules-test.preds')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "all_markables_te_secret,_ = coref.read_dataset('data/te')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6796\tR: 0.5770\tP:0.8266\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(0.6796296296296297, 0.5770440251572327, 0.8265765765765766)"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# students can't run this\n",
    "coref.eval_predictions('predictions/rules-test.preds',all_markables_te_secret)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 3: Machine learning for coreference resolution\n",
    "\n",
    "You will now implement coreference resolution using the mention-ranking model. Let's start by implementing some features.\n",
    "\n",
    "**Deliverable 3.1** Implement `coref_features.minimal_features`, using the rules you wrote from `coref_rules.` This should be a function that takes a list of markables, and indices for two mentions, and returns a dict with features and counts. Include the following features:\n",
    "\n",
    "- `exact_match`\n",
    "- `last_token_match`\n",
    "- `content_match`\n",
    "- `cross_over`: value of 1 iff the mentions overlap\n",
    "- `new_entity`: value of 1 iff i=j\n",
    "\n",
    "For the first four features, you should call your code from coref_rules directly. (1 point)"
   ]
  },
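  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hedged sketch, the feature function might be shaped as follows. Your implementation should reuse your ```coref_rules``` matchers rather than re-deriving the logic; ```spans_overlap``` here is a hypothetical helper over the ```start_token``` and ```end_token``` fields.\n",
    "\n",
    "```python\n",
    "def minimal_features_sketch(markables, a, i):\n",
    "    # features for linking markable i to candidate antecedent a (a <= i)\n",
    "    if a == i:\n",
    "        return {'new-entity': 1.0}\n",
    "    f = {}\n",
    "    if coref_rules.exact_match(markables[a], markables[i]):\n",
    "        f['exact-match'] = 1.0\n",
    "    if coref_rules.match_last_token(markables[a], markables[i]):\n",
    "        f['last-token-match'] = 1.0\n",
    "    if coref_rules.match_on_content(markables[a], markables[i]):\n",
    "        f['content-match'] = 1.0\n",
    "    if spans_overlap(markables[a], markables[i]):  # hypothetical helper\n",
    "        f['crossover'] = 1.0\n",
    "    return f\n",
    "```"
   ]
  },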
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "reload(coref_features);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0 {'end_token': 2, 'start_token': 0, 'tags': ['NNP', 'NNP'], 'string': ['Johnston', 'Atoll'], 'entity': u'set_76'}\n",
      "1 {'end_token': 12, 'start_token': 10, 'tags': ['NNP', 'NNP'], 'string': ['Pacific', 'Ocean'], 'entity': u'set_3'}\n",
      "2 {'end_token': 18, 'start_token': 17, 'tags': ['NNP'], 'string': ['Hawaii'], 'entity': u'set_107'}\n",
      "3 {'end_token': 21, 'start_token': 19, 'tags': ['DT', 'NN'], 'string': ['The', 'atoll'], 'entity': u'set_76'}\n",
      "4 {'end_token': 30, 'start_token': 26, 'tags': ['DT', 'JJ', 'NN', 'NN'], 'string': ['the', 'coral', 'reef', 'platform'], 'entity': u'set_8'}\n",
      "5 {'end_token': 34, 'start_token': 32, 'tags': ['CD', 'NNS'], 'string': ['four', 'islands'], 'entity': u'set_10000'}\n",
      "6 {'end_token': 36, 'start_token': 35, 'tags': ['NNP'], 'string': ['Johnston'], 'entity': u'set_76'}\n",
      "7 {'end_token': 39, 'start_token': 35, 'tags': ['NNP', 'CC', 'NNP', 'NNS'], 'string': ['Johnston', 'and', 'Sand', 'islands'], 'entity': u'set_100'}\n",
      "8 {'end_token': 39, 'start_token': 37, 'tags': ['NNP', 'NNS'], 'string': ['Sand', 'islands'], 'entity': u'set_83'}\n",
      "9 {'end_token': 55, 'start_token': 46, 'tags': ['NNP', 'NNP', 'NNP', 'NNP', 'CC', 'NNP', 'NNP', 'NNP', 'NNP'], 'string': ['North', '-LRB-', 'Akau', '-RRB-', 'and', 'East', '-LRB-', 'Hikina', '-RRB-'], 'entity': u'set_10'}\n",
      "10 {'end_token': 66, 'start_token': 64, 'tags': ['NNP', 'NNP'], 'string': ['Johnston', 'Atoll'], 'entity': u'set_76'}\n",
      "11 {'end_token': 77, 'start_token': 69, 'tags': ['CD', 'IN', 'DT', 'NNP', 'NNPS', 'NNP', 'NNP', 'NNP'], 'string': ['one', 'of', 'the', 'United', 'States', 'Minor', 'Outlying', 'Islands'], 'entity': u'set_76'}\n",
      "12 {'end_token': 74, 'start_token': 72, 'tags': ['NNP', 'NNPS'], 'string': ['United', 'States'], 'entity': u'set_108'}\n",
      "13 {'end_token': 82, 'start_token': 79, 'tags': ['RB', 'CD', 'NNS'], 'string': ['nearly', '70', 'years'], 'entity': u'set_71'}\n",
      "14 {'end_token': 85, 'start_token': 83, 'tags': ['DT', 'NN'], 'string': ['the', 'atoll'], 'entity': u'set_76'}\n"
     ]
    }
   ],
   "source": [
    "for i,markable in enumerate(all_markables[7][:15]):\n",
    "    print i,markable"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{}\n",
      "{'new-entity': 1.0}\n",
      "{'last-token-match': 1}\n",
      "{'crossover': 1}\n",
      "{'exact-match': 1, 'last-token-match': 1, 'content-match': 1}\n"
     ]
    }
   ],
   "source": [
    "print coref_features.minimal_features(all_markables[7],0,1)\n",
    "print coref_features.minimal_features(all_markables[7],1,1)\n",
    "print coref_features.minimal_features(all_markables[7],0,3)\n",
    "print coref_features.minimal_features(all_markables[7],6,7)\n",
    "print coref_features.minimal_features(all_markables[7],3,14)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Deliverable 3.2** Implement `coref_learning.mention_rank`, which should select the highest-scoring antecedent for each markable. (1 points)"
   ]
  },
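  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, mention ranking is an argmax over the candidate antecedents ```a = 0, ..., i```, where ```a == i``` means \"start a new entity\". A sketch, assuming dict-valued features and ```defaultdict``` weights:\n",
    "\n",
    "```python\n",
    "def mention_rank_sketch(markables, i, feats, weights):\n",
    "    best_a, best_score = None, float('-inf')\n",
    "    for a in range(i + 1):\n",
    "        score = sum(weights[name] * val\n",
    "                    for name, val in feats(markables, a, i).items())\n",
    "        if score > best_score:\n",
    "            best_a, best_score = a, score\n",
    "    return best_a\n",
    "```"
   ]
  },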
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "reload(coref_learning);\n",
    "reload(coref_features);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "0\n"
     ]
    }
   ],
   "source": [
    "hand_weights = defaultdict(float,\n",
    "                           {'new-entity':0.5,\n",
    "                           'last-token-match':0.6,\n",
    "                            'content-match':0.7,\n",
    "                            'exact-match':1.}\n",
    "                          )\n",
    "print coref_learning.mention_rank(all_markables[3],1,coref_features.minimal_features,hand_weights)\n",
    "print coref_learning.mention_rank(all_markables[3],7,coref_features.minimal_features,hand_weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Deliverable 3.3** Now implement `coref_learning.compute_instance_update`, which compute a perceptron update for instance $i$. (0.5 points)"
   ]
  },
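  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The update has the usual perceptron form: features of an acceptable antecedent minus features of the predicted one, with no update when the prediction is already acceptable. A hedged sketch (comparing the ```entity``` fields is an assumption about how \"refers to the same entity\" is checked):\n",
    "\n",
    "```python\n",
    "def compute_instance_update_sketch(markables, i, true_a, feats, weights):\n",
    "    pred_a = coref_learning.mention_rank(markables, i, feats, weights)\n",
    "    update = defaultdict(float)\n",
    "    # no update if the prediction picks the true antecedent, or any\n",
    "    # earlier mention of the same gold entity\n",
    "    if pred_a == true_a or (pred_a < i and\n",
    "                            markables[pred_a]['entity'] == markables[true_a]['entity']):\n",
    "        return update\n",
    "    for name, val in feats(markables, true_a, i).items():\n",
    "        update[name] += val\n",
    "    for name, val in feats(markables, pred_a, i).items():\n",
    "        update[name] -= val\n",
    "    return update\n",
    "```"
   ]
  },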
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0 {'end_token': 2, 'start_token': 0, 'tags': ['NNP', 'NNP'], 'string': ['Johnston', 'Atoll'], 'entity': u'set_76'}\n",
      "1 {'end_token': 12, 'start_token': 10, 'tags': ['NNP', 'NNP'], 'string': ['Pacific', 'Ocean'], 'entity': u'set_3'}\n",
      "2 {'end_token': 18, 'start_token': 17, 'tags': ['NNP'], 'string': ['Hawaii'], 'entity': u'set_107'}\n",
      "3 {'end_token': 21, 'start_token': 19, 'tags': ['DT', 'NN'], 'string': ['The', 'atoll'], 'entity': u'set_76'}\n",
      "4 {'end_token': 30, 'start_token': 26, 'tags': ['DT', 'JJ', 'NN', 'NN'], 'string': ['the', 'coral', 'reef', 'platform'], 'entity': u'set_8'}\n",
      "5 {'end_token': 34, 'start_token': 32, 'tags': ['CD', 'NNS'], 'string': ['four', 'islands'], 'entity': u'set_10000'}\n",
      "6 {'end_token': 36, 'start_token': 35, 'tags': ['NNP'], 'string': ['Johnston'], 'entity': u'set_76'}\n",
      "7 {'end_token': 39, 'start_token': 35, 'tags': ['NNP', 'CC', 'NNP', 'NNS'], 'string': ['Johnston', 'and', 'Sand', 'islands'], 'entity': u'set_100'}\n",
      "8 {'end_token': 39, 'start_token': 37, 'tags': ['NNP', 'NNS'], 'string': ['Sand', 'islands'], 'entity': u'set_83'}\n",
      "9 {'end_token': 55, 'start_token': 46, 'tags': ['NNP', 'NNP', 'NNP', 'NNP', 'CC', 'NNP', 'NNP', 'NNP', 'NNP'], 'string': ['North', '-LRB-', 'Akau', '-RRB-', 'and', 'East', '-LRB-', 'Hikina', '-RRB-'], 'entity': u'set_10'}\n",
      "10 {'end_token': 66, 'start_token': 64, 'tags': ['NNP', 'NNP'], 'string': ['Johnston', 'Atoll'], 'entity': u'set_76'}\n",
      "11 {'end_token': 77, 'start_token': 69, 'tags': ['CD', 'IN', 'DT', 'NNP', 'NNPS', 'NNP', 'NNP', 'NNP'], 'string': ['one', 'of', 'the', 'United', 'States', 'Minor', 'Outlying', 'Islands'], 'entity': u'set_76'}\n",
      "12 {'end_token': 74, 'start_token': 72, 'tags': ['NNP', 'NNPS'], 'string': ['United', 'States'], 'entity': u'set_108'}\n",
      "13 {'end_token': 82, 'start_token': 79, 'tags': ['RB', 'CD', 'NNS'], 'string': ['nearly', '70', 'years'], 'entity': u'set_71'}\n",
      "14 {'end_token': 85, 'start_token': 83, 'tags': ['DT', 'NN'], 'string': ['the', 'atoll'], 'entity': u'set_76'}\n",
      "15 {'end_token': 92, 'start_token': 91, 'tags': ['JJ'], 'string': ['American'], 'entity': u'set_108'}\n",
      "16 {'end_token': 97, 'start_token': 95, 'tags': ['DT', 'NN'], 'string': ['that', 'time'], 'entity': u'set_71'}\n",
      "17 {'end_token': 98, 'start_token': 97, 'tags': ['PRP'], 'string': ['it'], 'entity': u'set_76'}\n"
     ]
    }
   ],
   "source": [
    "for i,markable in enumerate(all_markables[7][:18]):\n",
    "    print i,markable"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "prediction: 3\n",
      "update at a=3: {}\n",
      "update at a=10: {}\n",
      "update at a=12: defaultdict(<type 'float'>, {'exact-match': -1.0, 'last-token-match': -1.0, 'content-match': -1.0})\n",
      "update at a=4: defaultdict(<type 'float'>, {'exact-match': -1.0, 'last-token-match': -1.0, 'content-match': -1.0})\n"
     ]
    }
   ],
   "source": [
    "print \"prediction:\",coref_learning.mention_rank(all_markables[7],14,coref_features.minimal_features,hand_weights)\n",
    "print \"update at a=3:\",coref_learning.compute_instance_update(all_markables[7],14,3,coref_features.minimal_features,hand_weights)\n",
    "print \"update at a=10:\",coref_learning.compute_instance_update(all_markables[7],14,10,coref_features.minimal_features,hand_weights)\n",
    "print \"update at a=12:\",coref_learning.compute_instance_update(all_markables[7],14,12,coref_features.minimal_features,hand_weights)\n",
    "print \"update at a=4:\",coref_learning.compute_instance_update(all_markables[7],14,4,coref_features.minimal_features,hand_weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Deliverable 3.4** You are now ready to implement `coref_learning.train_avg_perceptron`.\n",
    "\n",
    "You can probably get away with \"naive\" weight averaging, unless you want to go crazy with features later.\n",
    "\n",
    "Make sure that your running total of weights gets updated after each markable."
   ]
  },
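  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible sketch of \"naive\" averaging (illustrative only, not the required implementation; `update_fn` is a hypothetical stand-in for the per-mention update, as computed by `compute_instance_update`). The key point is that the running total of weights is incremented after every markable, and each epoch's entry in the returned history is the average so far:\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "def train_avg_perceptron_sketch(markable_sets, update_fn, n_its=2):\n",
    "    weights = defaultdict(float)\n",
    "    tot_weights = defaultdict(float)\n",
    "    history = []\n",
    "    t = 0\n",
    "    for it in range(n_its):\n",
    "        for markables in markable_sets:\n",
    "            for i in range(len(markables)):\n",
    "                for feat, delta in update_fn(markables, i, weights).items():\n",
    "                    weights[feat] += delta\n",
    "                # add the current weights to the running total after *each* markable\n",
    "                t += 1\n",
    "                for feat, w in weights.items():\n",
    "                    tot_weights[feat] += w\n",
    "        # record the averaged weights at the end of each epoch\n",
    "        history.append(defaultdict(float, dict((f, w / t) for f, w in tot_weights.items())))\n",
    "    return history\n",
    "```"
   ]
  },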
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "reload(coref_features);\n",
    "reload(coref_learning);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3 2\n"
     ]
    }
   ],
   "source": [
    "theta_simple = coref_learning.train_avg_perceptron([all_markables[0][:10]],coref_features.minimal_features,N_its=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "defaultdict(float,\n",
       "            {'content-match': 0.6,\n",
       "             'crossover': 0.0,\n",
       "             'exact-match': 0.6,\n",
       "             'last-token-match': 0.6,\n",
       "             'new-entity': 0.2})"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "theta_simple[-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1126 1126 1126 1126 1126\n"
     ]
    }
   ],
   "source": [
    "theta_hist = coref_learning.train_avg_perceptron(all_markables,coref_features.minimal_features,N_its=5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6954\tR: 0.5849\tP:0.8575\n",
      "F: 0.6954\tR: 0.5849\tP:0.8575\n",
      "F: 0.6954\tR: 0.5849\tP:0.8575\n",
      "F: 0.6954\tR: 0.5849\tP:0.8575\n",
      "F: 0.6954\tR: 0.5849\tP:0.8575\n"
     ]
    }
   ],
   "source": [
    "coref_learning.eval_weight_hist(all_markables,theta_hist,coref_features.minimal_features);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "defaultdict(float,\n",
       "            {'content-match': 0.6115865701119158,\n",
       "             'crossover': -0.6398946675444371,\n",
       "             'exact-match': 0.5365371955233706,\n",
       "             'last-token-match': 0.2577353522053983,\n",
       "             'new-entity': 0.4427254772876893})"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "theta_hist[-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Already pretty competitive with the rule-based alternatives, at least on the training set. Let's run on the dev set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6671\tR: 0.5756\tP:0.7933\n",
      "F: 0.6671\tR: 0.5756\tP:0.7933\n",
      "F: 0.6671\tR: 0.5756\tP:0.7933\n",
      "F: 0.6671\tR: 0.5756\tP:0.7933\n",
      "F: 0.6671\tR: 0.5756\tP:0.7933\n"
     ]
    }
   ],
   "source": [
    "coref_learning.eval_weight_hist(all_markables_dev,theta_hist,coref_features.minimal_features);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# run this block to output your predictions\n",
    "coref.write_predictions(coref_learning.make_resolver(coref_features.minimal_features,\n",
    "                                                    theta_hist[-1]),\n",
    "                        all_markables_dev,\n",
    "                        'predictions/minimal-dev.preds')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6671\tR: 0.5756\tP:0.7933\n"
     ]
    }
   ],
   "source": [
    "coref.eval_predictions('predictions/minimal-dev.preds',all_markables_dev);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Deliverable 3.5** Implement distance features in `coref_features.distance_features`, measuring the mention distance and the token distance. Specifically:\n",
    "\n",
    "- **Mention distance** is the difference between the mention indices of $i$ and $j$, i.e., $i-j$.\n",
    "- **Token distance** is the number of tokens between the start of $i$ and the end of $j$.\n",
    "\n",
    "These should be binary features, up to a maximum distance of 10, with the final feature indicating distance of 10 and above. The desired behavior is shown below. (0.25 points)"
   ]
  },
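  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the intended binning (illustrative only; assumes the `(markables, a, i)` argument order with antecedent `a` and mention `i`, and the markable dict keys shown above):\n",
    "\n",
    "```python\n",
    "def distance_features_sketch(markables, a, i, max_dist=10):\n",
    "    feats = {}\n",
    "    if a < i:  # no distance features for a candidate 'new entity' (a == i)\n",
    "        # bin both distances, capping at max_dist\n",
    "        mention_dist = min(i - a, max_dist)\n",
    "        token_dist = min(markables[i]['start_token'] - markables[a]['end_token'], max_dist)\n",
    "        feats['mention-distance-%d' % mention_dist] = 1\n",
    "        feats['token-distance-%d' % token_dist] = 1\n",
    "    return feats\n",
    "```"
   ]
  },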
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "reload(coref_features);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0 {'end_token': 2, 'start_token': 0, 'tags': ['NNP', 'NNP'], 'string': ['Johnston', 'Atoll'], 'entity': u'set_76'}\n",
      "1 {'end_token': 12, 'start_token': 10, 'tags': ['NNP', 'NNP'], 'string': ['Pacific', 'Ocean'], 'entity': u'set_3'}\n",
      "2 {'end_token': 18, 'start_token': 17, 'tags': ['NNP'], 'string': ['Hawaii'], 'entity': u'set_107'}\n",
      "3 {'end_token': 21, 'start_token': 19, 'tags': ['DT', 'NN'], 'string': ['The', 'atoll'], 'entity': u'set_76'}\n"
     ]
    }
   ],
   "source": [
    "for i,markable_i in enumerate(all_markables[7][:4]):\n",
    "    print i,markable_i"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{}\n",
      "{'token-distance-8': 1, 'mention-distance-1': 1}\n",
      "{'token-distance-10': 1, 'mention-distance-2': 1}\n",
      "{'mention-distance-2': 1, 'token-distance-7': 1}\n",
      "{'token-distance-10': 1, 'mention-distance-10': 1}\n"
     ]
    }
   ],
   "source": [
    "print coref_features.distance_features(all_markables[7],0,0)\n",
    "print coref_features.distance_features(all_markables[7],0,1)\n",
    "print coref_features.distance_features(all_markables[7],0,2)\n",
    "print coref_features.distance_features(all_markables[7],1,3)\n",
    "print coref_features.distance_features(all_markables[7],0,30)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Deliverable 3.6** Implement `coref_features.make_feature_union`, which should take a list of feature functions, and return a function that computes the union of all features in the list. You can assume the feature functions don't use the same name for any feature. (*0.25 points*)"
   ]
  },
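  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way this can be sketched (illustrative only): return a closure that merges the feature dicts, relying on the assumption above that feature names are distinct:\n",
    "\n",
    "```python\n",
    "def make_feature_union_sketch(feat_fns):\n",
    "    def union_feats(markables, a, i):\n",
    "        feats = {}\n",
    "        for fn in feat_fns:\n",
    "            feats.update(fn(markables, a, i))  # names assumed distinct\n",
    "        return feats\n",
    "    return union_feats\n",
    "```"
   ]
  },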
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "reload(coref_features);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "joint_feats1 = coref_features.make_feature_union([coref_features.minimal_features,\n",
    "                                                  coref_features.distance_features])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'token-distance-6': 1, 'mention-distance-2': 1}\n",
      "{'token-distance-10': 1, 'mention-distance-3': 1}\n",
      "{'mention-distance-7': 1, 'token-distance-10': 1, 'last-token-match': 1}\n",
      "{'new-entity': 1.0}\n"
     ]
    }
   ],
   "source": [
    "print joint_feats1(all_markables[3],1,3)\n",
    "print joint_feats1(all_markables[3],0,3)\n",
    "print joint_feats1(all_markables[3],0,7)\n",
    "print joint_feats1(all_markables[3],10,10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1433 1403 1410 1437 1401 1437 1401 1437 1401 1437\n"
     ]
    }
   ],
   "source": [
    "theta_hist = coref_learning.train_avg_perceptron(all_markables,joint_feats1,N_its=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6977\tR: 0.5893\tP:0.8551\n",
      "F: 0.6941\tR: 0.5792\tP:0.8659\n",
      "F: 0.6943\tR: 0.5787\tP:0.8675\n",
      "F: 0.6945\tR: 0.5783\tP:0.8691\n",
      "F: 0.6945\tR: 0.5783\tP:0.8691\n",
      "F: 0.6950\tR: 0.5787\tP:0.8698\n",
      "F: 0.6950\tR: 0.5787\tP:0.8698\n",
      "F: 0.6950\tR: 0.5787\tP:0.8698\n",
      "F: 0.6950\tR: 0.5787\tP:0.8698\n",
      "F: 0.6950\tR: 0.5787\tP:0.8698\n"
     ]
    }
   ],
   "source": [
    "coref_learning.eval_weight_hist(all_markables,theta_hist,joint_feats1);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pretty much the same on the training set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6918\tR: 0.5998\tP:0.8171\n",
      "F: 0.6887\tR: 0.5913\tP:0.8246\n",
      "F: 0.6893\tR: 0.5901\tP:0.8285\n",
      "F: 0.6883\tR: 0.5889\tP:0.8282\n",
      "F: 0.6883\tR: 0.5889\tP:0.8282\n",
      "F: 0.6883\tR: 0.5889\tP:0.8282\n",
      "F: 0.6883\tR: 0.5889\tP:0.8282\n",
      "F: 0.6883\tR: 0.5889\tP:0.8282\n",
      "F: 0.6883\tR: 0.5889\tP:0.8282\n",
      "F: 0.6883\tR: 0.5889\tP:0.8282\n"
     ]
    }
   ],
   "source": [
    "coref_learning.eval_weight_hist(all_markables_dev,theta_hist,joint_feats1);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Better on dev."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Deliverable 3.7** Implement `coref_features.make_feature_cross_product`, which should take two feature functions and return a function that computes their cross product. Desired behavior:\n",
    "\n",
    "- $f_1 = \\{(i,x_i), (j, x_j)\\}$\n",
    "- $f_2 = \\{(m,x_m), (n, x_n)\\}$\n",
    "- $f_1 \\times f_2 = \\{((i,m),x_i \\times x_m), ((i,n),x_i \\times x_n), ((j,m), x_j \\times x_m), ((j,n), x_j \\times x_n)\\}$\n",
    "\n",
    "The product of features \"feat1\" and \"feat2\" should have the name \"feat1-feat2\", as shown in the example below. \n",
    "\n",
    "(*0.25 points*)"
   ]
  },
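  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of the cross product (illustrative only; `make_feature_cross_product_sketch` is a hypothetical name). Every pair of feature names becomes a hyphenated name, and the new value is the product of the two values:\n",
    "\n",
    "```python\n",
    "def make_feature_cross_product_sketch(fn1, fn2):\n",
    "    def prod_feats(markables, a, i):\n",
    "        f1 = fn1(markables, a, i)\n",
    "        f2 = fn2(markables, a, i)\n",
    "        # pair (name1, name2) -> feature 'name1-name2' with value v1 * v2\n",
    "        return dict((k1 + '-' + k2, v1 * v2)\n",
    "                    for k1, v1 in f1.items()\n",
    "                    for k2, v2 in f2.items())\n",
    "    return prod_feats\n",
    "```"
   ]
  },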
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "reload(coref_features);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "prod_feats1 = coref_features.make_feature_cross_product(coref_features.minimal_features,\n",
    "                                                        coref_features.distance_features)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'exact-match': 1, 'last-token-match': 1, 'content-match': 1}\n",
      "{'token-distance-10': 1, 'mention-distance-10': 1}\n",
      "{'content-match-mention-distance-10': 1, 'exact-match-mention-distance-10': 1, 'content-match-token-distance-10': 1, 'last-token-match-mention-distance-10': 1, 'last-token-match-token-distance-10': 1, 'exact-match-token-distance-10': 1}\n"
     ]
    }
   ],
   "source": [
    "print coref_features.minimal_features(all_markables[7],3,14)\n",
    "print coref_features.distance_features(all_markables[7],3,14)\n",
    "print prod_feats1(all_markables[7],3,14)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's try a combined feature set, which includes the union of the product features and the original features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "feats = coref_features.make_feature_union([coref_features.minimal_features,\n",
    "                                           coref_features.distance_features,\n",
    "                                           prod_feats1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1416 1418 1390 1392 1412 1405 1367 1413 1392 1384\n"
     ]
    }
   ],
   "source": [
    "theta_hist = coref_learning.train_avg_perceptron(all_markables,feats,N_its=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6910\tR: 0.5748\tP:0.8661\n",
      "F: 0.6922\tR: 0.5761\tP:0.8670\n",
      "F: 0.6892\tR: 0.5704\tP:0.8705\n",
      "F: 0.6888\tR: 0.5695\tP:0.8715\n",
      "F: 0.6897\tR: 0.5704\tP:0.8722\n",
      "F: 0.6888\tR: 0.5690\tP:0.8726\n",
      "F: 0.6863\tR: 0.5646\tP:0.8747\n",
      "F: 0.6886\tR: 0.5704\tP:0.8687\n",
      "F: 0.6920\tR: 0.5756\tP:0.8675\n",
      "F: 0.6922\tR: 0.5761\tP:0.8670\n"
     ]
    }
   ],
   "source": [
    "coref_learning.eval_weight_hist(all_markables,theta_hist,feats);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.6827\tR: 0.5852\tP:0.8190\n",
      "F: 0.6859\tR: 0.5889\tP:0.8212\n",
      "F: 0.6828\tR: 0.5816\tP:0.8265\n",
      "F: 0.6823\tR: 0.5804\tP:0.8276\n",
      "F: 0.6813\tR: 0.5804\tP:0.8247\n",
      "F: 0.6790\tR: 0.5780\tP:0.8227\n",
      "F: 0.6766\tR: 0.5744\tP:0.8232\n",
      "F: 0.6752\tR: 0.5768\tP:0.8140\n",
      "F: 0.6784\tR: 0.5816\tP:0.8139\n",
      "F: 0.6784\tR: 0.5816\tP:0.8139\n"
     ]
    }
   ],
   "source": [
    "coref_learning.eval_weight_hist(all_markables_dev,theta_hist,feats);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This doesn't help much in this case, but you may find it useful in the bakeoff."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Deliverable 3.8** (7650 only; 4650 optional)\n",
    "\n",
    "To match nominals, it is often necessary to capture semantics. Find a paper (in ACL, NAACL, EACL, or TACL, since 2007) that attempts to use semantic analysis to do nominal coreference, and explain:\n",
    "\n",
    "- What form of semantics they are trying to capture (e.g., synonymy, hypernymy, predicate-argument, distributional)\n",
    "- How they formalize semantics into features, constraints, or some other preference\n",
    "- How much it helps\n",
    "\n",
    "Put your answer in `text-answers.md` (1 point).\n",
    "\n",
    "As usual, if you are in 4650 and you do this problem, you will be graded on the 7650 rubric."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Final bakeoff!\n",
    "\n",
    "Ideas for improving your system:\n",
    "\n",
    "- Large-margin training\n",
    "- Cost-sensitive training to balance precision and recall\n",
    "- Syntax (you can parse all the markables as a preprocessing step)\n",
    "    - Tree distance\n",
    "    - Syntactic parallelism\n",
    "    - Better head matching\n",
    "- Word vector matching\n",
    "- Neural representations of each entity (Wiseman et al., 2016)\n",
    "- Multilayer perceptron for mention ranking\n",
    "\n",
    "Feel free to search the research literature (via Google Scholar) to get ideas. If you use an idea from another paper, mention the paper (authors, title, and URL) in your comments in `coref_features.py`.\n",
    "\n",
    "**Deliverable 3.9**. Run the code blocks below to output predictions for both the dev and test sets. Note that `theta_hist` contains the weight history over all training epochs. You don't have to use the final set of weights for your output.\n",
    "\n",
    "Scoring:\n",
    "\n",
    "- Dev F1 > .71: +0.25 points\n",
    "- Dev F1 > .72: +0.25 points\n",
    "- Dev F1 > .73: +0.25 points\n",
    "- Test F1 > .7: +0.25 points \n",
    "\n",
    "The test set threshold is a low bar if you pass the dev tests without badly overfitting.\n",
    "\n",
    "Extra credit (evaluated on test set):\n",
    "\n",
    "- Best in 4650: +0.5 points\n",
    "- Best in 7650: +0.5 points\n",
    "- Better than best TA/prof system: +0.5 points"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "reload(coref_features);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# writing a function to make the bakeoff features can be convenient\n",
    "# but you can define them directly if you want\n",
    "# we are only evaluating the outputs\n",
    "bakeoff_feats = coref_features.make_bakeoff_features()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1160 1112 1100 1097 1082 1090 1079 1093 1084 1061\n"
     ]
    }
   ],
   "source": [
    "theta_hist = coref_learning.train_avg_perceptron(all_markables,bakeoff_feats,N_its=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.7354\tR: 0.6631\tP:0.8254\n",
      "F: 0.7330\tR: 0.6645\tP:0.8172\n",
      "F: 0.7329\tR: 0.6653\tP:0.8156\n",
      "F: 0.7353\tR: 0.6702\tP:0.8145\n",
      "F: 0.7338\tR: 0.6711\tP:0.8095\n",
      "F: 0.7365\tR: 0.6737\tP:0.8123\n",
      "F: 0.7362\tR: 0.6737\tP:0.8114\n",
      "F: 0.7378\tR: 0.6750\tP:0.8135\n",
      "F: 0.7368\tR: 0.6746\tP:0.8116\n",
      "F: 0.7373\tR: 0.6763\tP:0.8103\n"
     ]
    }
   ],
   "source": [
    "coref_learning.eval_weight_hist(all_markables,theta_hist,bakeoff_feats);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F: 0.7508\tR: 0.6832\tP:0.8333\n",
      "F: 0.7581\tR: 0.6917\tP:0.8387\n",
      "F: 0.7510\tR: 0.6892\tP:0.8249\n",
      "F: 0.7479\tR: 0.6868\tP:0.8208\n",
      "F: 0.7464\tR: 0.6868\tP:0.8173\n",
      "F: 0.7469\tR: 0.6868\tP:0.8184\n",
      "F: 0.7480\tR: 0.6892\tP:0.8178\n",
      "F: 0.7477\tR: 0.6880\tP:0.8187\n",
      "F: 0.7465\tR: 0.6856\tP:0.8194\n",
      "F: 0.7485\tR: 0.6892\tP:0.8190\n"
     ]
    }
   ],
   "source": [
    "coref_learning.eval_weight_hist(all_markables_dev,theta_hist,bakeoff_feats);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# run this block to output your predictions\n",
    "coref.write_predictions(coref_learning.make_resolver(bakeoff_feats,\n",
    "                                                    theta_hist[1]),\n",
    "                        all_markables_dev,\n",
    "                        'predictions/bakeoff-dev.preds')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# run this block to output your predictions\n",
    "coref.write_predictions(coref_learning.make_resolver(bakeoff_feats,\n",
    "                                                    theta_hist[1]),\n",
    "                        all_markables_te,\n",
    "                        'predictions/bakeoff-te.preds')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12+"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
